Multi-Query Optimization in MapReduce Framework

Authors

Guoping Wang

Chee Yong Chan

Abstract

MapReduce has recently emerged as a new paradigm for large-scale data analysis due to its high scalability, fine-grained fault tolerance and easy programming model. Since different jobs often share similar work (e.g., several jobs scan the same input file or produce the same map output), there are many opportunities to optimize the performance of a batch of jobs. In this paper, we propose two new techniques for multi-job optimization in the MapReduce framework. The first is a generalized grouping technique (which generalizes the recently proposed MRShare technique) that merges multiple jobs into a single job, thereby enabling the merged jobs to share both the scan of the input file and the communication of the common map output. The second is a materialization technique that enables multiple jobs to share both the scan of the input file and the communication of the common map output via partial materialization of the map output of some jobs (in the map and/or reduce phase). Our second contribution is a new optimization algorithm that, given an input batch of jobs, produces an optimal plan by a judicious partitioning of the jobs into groups and an optimal assignment of a processing technique to each group. Our experimental results on Hadoop demonstrate that our new approach significantly outperforms the state-of-the-art technique, MRShare, by up to 107%.
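The idea of merging jobs so that they share one scan of the input can be illustrated with a small in-memory sketch. This is not the paper's implementation or the MRShare algorithm: `run_merged`, the job names, and the toy map and reduce functions are all hypothetical, and the sketch models only scan sharing, not the sharing of map-output communication or the materialization technique.

```python
from collections import defaultdict


def run_merged(records, jobs):
    """Run several (map_fn, reduce_fn) jobs over one shared pass of `records`.

    `jobs` maps a job id to a (map_fn, reduce_fn) pair. Illustrative only:
    a real MapReduce engine would shuffle the map output across machines.
    """
    # groups[job_id][key] -> values emitted by that job's map function
    groups = {job_id: defaultdict(list) for job_id in jobs}
    scans = 0
    for record in records:  # each record is read once for all jobs
        scans += 1
        for job_id, (map_fn, _) in jobs.items():
            for key, value in map_fn(record):
                groups[job_id][key].append(value)
    # Apply each job's own reduce function to its grouped map output.
    results = {
        job_id: {key: jobs[job_id][1](key, values)
                 for key, values in grouped.items()}
        for job_id, grouped in groups.items()
    }
    return results, scans


def word_map(line):
    return [(word, 1) for word in line.split()]


def length_map(line):
    return [(len(word), 1) for word in line.split()]


def sum_reduce(key, values):
    return sum(values)


records = ["a b a", "b c"]
jobs = {
    "word_count": (word_map, sum_reduce),
    "length_count": (length_map, sum_reduce),
}
results, scans = run_merged(records, jobs)
```

Run separately, the two toy jobs would each read every record; merged, each record is read once while each job keeps its own reduce logic and its own result.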

Similar resources

Simultaneous Processing of Multi-Skyline Queries with MapReduce

With the rapid increase in the number of applications and in the size of data, multi-query processing on the MapReduce framework has gained much attention. Meanwhile, there has been much interest in skyline query processing due to its power in multi-criteria decision making and analysis. Recently, there have been attempts to optimize multi-query processing in MapReduce. However, they are not ...

A Lightweight Evaluation Framework for Table Layouts in MapReduce Based Query Systems

Table layout determines how relational row-column data values are organized and stored. In recent years, a considerable number of candidate layouts have been developed in MapReduce-based query systems; they differ in storage space utilization, data loading time, query performance and so on. Most of the time, users are confronted with the problem of choosing the best overall table layout given the ...

Comparative Study of Multi-query Optimization Techniques using Shared Predicate-based for Big Data

Big data analytical systems, such as MapReduce, have become a major concern for many enterprises and research groups. Currently, multiple queries, translated into MapReduce jobs, are submitted repeatedly with similar tasks. Exploiting these similar tasks makes it possible to avoid repeated computation across MapReduce jobs. Therefore, many studies have addressed the sharing opportunity to o...

MRShare: Sharing Across Multiple Queries in MapReduce

Large-scale data analysis lies at the core of modern enterprises and scientific research. With the emergence of cloud computing, the use of an analytical query processing infrastructure (e.g., Amazon EC2) can be directly mapped to monetary value. MapReduce has been a popular framework in the context of cloud computing, designed to serve long-running queries (jobs) which can be processed in batc...

Multidimensional Aggregation Process in Cloud Computing System

This paper presents a multidimensional aggregation query processing algorithm for cloud computing systems. Existing cloud computing research on the MapReduce computation framework lacks effective support for the aggregation of multidimensional data. In addition, using the MapReduce framework requires starting a large number of computing nodes and consumes large amounts of energy. For the ab...

Journal:

PVLDB

Volume 7

Publication date: 2013
